Feature selection helps reduce data acquisition costs in ML, but the standard approach is to train models with static feature subsets. Here, we consider the dynamic feature selection (DFS) problem where a model sequentially queries features based on the presently available information. DFS is often addressed with reinforcement learning (RL), but we explore a simpler approach of greedily selecting features based on their conditional mutual information. This method is theoretically appealing but requires oracle access to the data distribution, so we develop a learning approach based on amortized optimization. The proposed method is shown to recover the greedy policy when trained to optimality and outperforms numerous existing feature selection methods in our experiments, thus validating it as a simple but powerful approach for this problem.
translated by 谷歌翻译
Relation Extraction (RE) has been extended to cross-document scenarios because many relations are not simply described in a single document. This inevitably brings the challenge of efficient open-space evidence retrieval to support the inference of cross-document relations, along with the challenge of multi-hop reasoning on top of entities and evidence scattered in an open set of documents. To combat these challenges, we propose Mr.CoD, a multi-hop evidence retrieval method based on evidence path mining and ranking with adapted dense retrievers. We explore multiple variants of retrievers to show evidence retrieval is an essential part in cross-document RE. Experiments on CodRED show that evidence retrieval with Mr.Cod effectively acquires cross-document evidence that essentially supports open-setting cross-document RE. Additionally, we show that Mr.CoD facilitates evidence retrieval and boosts end-to-end RE performance with effective multi-hop reasoning in both closed and open settings of RE.
translated by 谷歌翻译
通过网络视频的快速增长,视频语言建模引起了很多关注。大多数现有方法都假定视频帧和文本描述是语义上关联的,并专注于视频级别的视频模型。但是,该假设通常是有两个原因的:(1)凭借视频内容丰富的语义,很难用单个视频级别的描述覆盖所有帧; (2)原始视频通常具有嘈杂/毫无意义的信息(例如,镜头,过渡或预告片)。尽管最近的许多作品部署了注意力来减轻此问题,但无关/嘈杂的信息仍然使得很难解决。为了克服此类挑战,我们提出了一个高效有效的模型,称为语言引导网络(LGDN),用于视频语言建模。与使用所有提取的视频帧的大多数现有方法不同,LGDN在语言监督下动态过滤了未对准或冗余的帧,并且每个视频仅获得2---4个显着帧,以进行交叉模式令牌级别的对准。在五个公共数据集上进行的广泛实验表明,我们的LGDN优于最先进的利润率。我们还提供了详细的消融研究,以揭示解决噪声问题的关键重要性,以启发未来的视频语言工作。
translated by 谷歌翻译
多模式学习,尤其是大规模的多模式预训练,在过去的几年中已经迅速发展,并带来了人工智能(AI)的最大进步。尽管具有有效性,但了解多模式预训练模型的潜在机制仍然是一个巨大的挑战。揭示此类模型的解释性可能会使AI领域中新型学习范式的突破。为此,鉴于人脑的多模式性质,我们建议借助非侵入性脑成像技术(例如功能磁共振成像(fMRI))探索多模式学习模型的解释性。具体而言,我们首先提出了1500万个图像文本对预训练的新设计的多模式基础模型,该模型在各种认知下游任务中显示出强烈的多模式理解和概括能力。此外,从神经编码的角度来看(基于我们的基础模型),我们发现,与单峰相比,经过多模式训练的视觉和舌编码器都更像脑状。特别是,我们确定了许多大脑区域,其中多模式训练的编码器表现出更好的神经编码性能。这与现有有关探索大脑多感觉整合的研究的发现是一致的。因此,我们认为,多模式基础模型是神经科学家研究人脑中多模式信号处理机制的更合适的工具。我们的发现还证明了多模式基础模型作为理想的计算模拟器的潜力,以促进脑和大脑的AI研究。
translated by 谷歌翻译
通过机器学习学习个性化的癌症治疗,可以提高癌症患者生存的机会。尽管机器学习和精确肿瘤学的最新进展,但这种方法仍然具有挑战性,因为在临床前/临床研究中收集数据以建模多种治疗效率通常是一个昂贵的,耗时的过程。此外,由于某些参与者/样本在试验期间未接受最合适的治疗方法,因此治疗分配的随机分配被证明是次优的。为了应对这一挑战,我们将药物筛查研究作为“上下文匪徒”问题,其中算法根据有关癌细胞系的上下文信息选择抗癌治疗剂,同时调整其治疗策略以最大程度地以“在线”方式以最大化治疗反应。我们建议使用一种新型的深贝叶斯土匪框架,该框架在近似后验之前使用功能,以基于由基因组特征和药物结构组成的多模式信息进行药物反应预测。我们对三个大规模的体外药物基因组学数据集进行了经验评估我们的方法,并表明我们的方法在识别给定细胞系的最佳处理方面优于几个基准。
translated by 谷歌翻译
Two key obstacles in biomedical relation extraction (RE) are the scarcity of annotations and the prevalence of instances without explicitly pre-defined labels due to low annotation coverage. Existing approaches, which treat biomedical RE as a multi-class classification task, often result in poor generalization in low-resource settings and do not have the ability to make selective prediction on unknown cases but give a guess from seen relations, hindering the applicability of those approaches. We present NBR, which converts biomedical RE as natural language inference formulation through indirect supervision. By converting relations to natural language hypotheses, NBR is capable of exploiting semantic cues to alleviate annotation scarcity. By incorporating a ranking-based loss that implicitly calibrates abstinent instances, NBR learns a clearer decision boundary and is instructed to abstain on uncertain instances. Extensive experiments on three widely-used biomedical RE benchmarks, namely ChemProt, DDI and GAD, verify the effectiveness of NBR in both full-set and low-resource regimes. Our analysis demonstrates that indirect supervision benefits biomedical RE even when a domain gap exists, and combining NLI knowledge with biomedical knowledge leads to the best performance gains.
translated by 谷歌翻译
Optimization in multi-task learning (MTL) is more challenging than single-task learning (STL), as the gradient from different tasks can be contradictory. When tasks are related, it can be beneficial to share some parameters among them (cooperation). However, some tasks require additional parameters with expertise in a specific type of data or discrimination (specialization). To address the MTL challenge, we propose Mod-Squad, a new model that is Modularized into groups of experts (a 'Squad'). This structure allows us to formalize cooperation and specialization as the process of matching experts and tasks. We optimize this matching process during the training of a single model. Specifically, we incorporate mixture of experts (MoE) layers into a transformer model, with a new loss that incorporates the mutual dependence between tasks and experts. As a result, only a small set of experts are activated for each task. This prevents the sharing of the entire backbone model between all tasks, which strengthens the model, especially when the training set size and the number of tasks scale up. More interestingly, for each task, we can extract the small set of experts as a standalone model that maintains the same performance as the large model. Extensive experiments on the Taskonomy dataset with 13 vision tasks and the PASCAL-Context dataset with 5 vision tasks show the superiority of our approach.
translated by 谷歌翻译
Masked language modeling (MLM) has been widely used for pre-training effective bidirectional representations, but incurs substantial training costs. In this paper, we propose a novel concept-based curriculum masking (CCM) method to efficiently pre-train a language model. CCM has two key differences from existing curriculum learning approaches to effectively reflect the nature of MLM. First, we introduce a carefully-designed linguistic difficulty criterion that evaluates the MLM difficulty of each token. Second, we construct a curriculum that gradually masks words related to the previously masked words by retrieving a knowledge graph. Experimental results show that CCM significantly improves pre-training efficiency. Specifically, the model trained with CCM shows comparative performance with the original BERT on the General Language Understanding Evaluation benchmark at half of the training cost.
translated by 谷歌翻译
This paper provides an introductory survey to GPT-3. We cover some of the historical development behind this technology, some of the key features of GPT-3, and discuss the machine learning model and the datasets used. We survey both academic and commercial efforts applying GPT-3 in diverse domains such as developing conversational AI chatbots, software development, creative work, domain knowledge, and business productivity. We discuss some of the challenges that GPT-3 faces such as the problems of training complexity, bias, and hallucination/incorrect answers. We also discuss the future research opportunities in this area.
translated by 谷歌翻译
Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training which distributes the workload of training across multiple computing nodes. However, the workflows, computational patterns, communication patterns, and optimization techniques of distributed GNN training remain preliminarily understood. In this paper, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks, emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
translated by 谷歌翻译